Abstract. In the first work of this series [1] it was shown that the conformational space of 
a molecule could be described to a fair degree of accuracy by means of a central hyperplane 
arrangement. The hyperplanes divide the space into a hierarchical set of cells that can be 
encoded by the face lattice poset of the arrangement. 

The model, however, lacked explicit rotational symmetry, which made it impossible to distinguish 
rotated structures in conformational space. This problem was solved in a second work [2] by 
sorting the elementary 3D components of the molecular system into a set of morphological classes 
that can be properly oriented in a standard 3D reference frame. 

This also made it possible to find a solution to the problem addressed in the present 
work: for a molecular system immersed in a heat bath we want to enumerate the subset of cells 
in conformational space that are visited by the molecule in its thermal wandering. 
If each visited cell is a vertex of a graph with edges to the adjacent cells, we explain here 
how such a graph can be built. 

I. Introduction 

Molecular dynamics simulations (MDS) are an essential tool for the modeling of large and very 
large molecules: they give us a precise and detailed view of a molecule's behaviour [3]. However, 
they have two limitations that hamper many practical applications: MDS is a random algorithm and 
as such does not perform a systematic exploration of molecular conformational space (CS); and, 
currently, the output from an MDS represents only a very small fraction of the volume spanned 
by the system in CS. 

Here we present a complementary approach that is locally less precise but can encompass 
a broader view of CS. It consists in dividing the CS into a finite set of cells, so that the only 
knowledge we seek about the system is whether it can be located in a given cell or not. 

As was extensively discussed in ref. [1], the partition is a variant of the $\mathcal{A}_N$ partition [4-5]: a 
central$^2$ arrangement of hyperplanes that divides CS into a set of cells shaped as polyhedral 
cones, such that for a molecule with $N$ atoms we have $(N!)^3$ cells. The set of hyperplanes is 
also a Coxeter reflection arrangement: the arrangement is invariant upon reflection on any of the 
hyperplanes. 

This structure has three important properties [1]: 

1. Associated with a Coxeter arrangement there is a polytope [5] whose symmetry group is 
the reflection group of the arrangement. The face lattice poset$^3$ of the polytope is a 
hierarchical combinatorial structure that enables us to manage the sheer complexity of 
CS, since with simple codes we can describe anything from huge regions down to single cells. 

2. The information needed to encode any face in the polytope is a sequence of $3 \times N$ 
integers, which is a generalization of a structure known to combinatorialists as a non-crossing 
partition sequence [5,6]. 

3. The construction is modular: if we consider the CS of two subsets of atoms from a system, 
the CS of the union set has an associated polytope that is the cartesian product of the 
polytopes$^4$ of the two subspaces, and its partition sequence is the ordered union of the two 
partition sequences [1]. 

The last property is particularly important, since the CS of the whole system can be built from 
that of the parts. The CS of a small number of atoms is very much smaller than that of the 
whole molecule, and we can reasonably assume that it can be thoroughly explored by an MDS. 
Moreover, in merging the CSs corresponding to subsets of atoms the number of cells grows 
exponentially while the length of the coding sequences grows only linearly. 
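
The contrast between the two growth rates can be illustrated with a few lines of Python. This is only a sketch based on the figures quoted above, $(N!)^3$ cells and $3 \times N$ integers per code; the function names are ours and purely illustrative.

```python
from math import factorial

def cell_count(n_atoms: int) -> int:
    """Number of maximal cells, (N!)^3: one partition per coordinate axis."""
    return factorial(n_atoms) ** 3

def code_length(n_atoms: int) -> int:
    """Length of the coding sequence: 3 x N integers."""
    return 3 * n_atoms

for n in (4, 8, 16):
    print(n, code_length(n), cell_count(n))
```

For $N = 4$ this gives 12 integers for 13824 cells, the number quoted later for the CS of a single simplex.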

II. The basic construction 

Let $(e_1, \ldots, e_N)$ be the standard basis in $\mathbb{R}^N$. The convex hull of the endpoints of the vectors $\{e_i\}$ 
is a regular $(N-1)$-simplex: this gives a segment, an equilateral triangle and a tetrahedron in 
2, 3 and 4 dimensions respectively. 

For each edge of the regular $(N-1)$-simplex there is a hyperplane $H_{ij}$: $x_i - x_j = 0$, perpendicular 
to the edge and containing the other vertices. This hyperplane divides $\mathbb{R}^N$ into three regions. A 
point $x$ can be in one of these: 

• $x_i > x_j$: the positive side, where the $i$-th coordinate dominates the $j$-th coordinate, 

• $x_i < x_j$: the negative side, with the $j$-th coordinate dominating the $i$-th coordinate, 

• $x_i = x_j$: on the plane. 

This leads to a sign vector $S$ for every point $x \in \mathbb{R}^N$, where the $\alpha$-th component $X_\alpha \in \{+, -, 0\}$ 
denotes whether $x$ is on the positive side of $H_\alpha$, on its negative side or lies on $H_\alpha$. 
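
As an illustration, the sign vector of a point can be computed directly from the definition. The sketch below assumes that the pairs $(i, j)$, $i < j$, are listed in lexicographic order; the function names and the sample point are hypothetical.

```python
from itertools import combinations

def sign(value):
    """Return +1, -1 or 0 according to the sign of value."""
    return (value > 0) - (value < 0)

def sign_vector(x):
    """Signs of x_i - x_j for every pair i < j, in the order (1,2), (1,3), ..."""
    return [sign(x[i] - x[j]) for i, j in combinations(range(len(x)), 2)]

def show(signs):
    return "".join("+" if s > 0 else "-" if s < 0 else "0" for s in signs)

point = [0.3, -1.2, 0.9, 0.0]  # arbitrary point in general position
print(show(sign_vector(point)))
```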

Also notice that the line $x_1 = x_2 = \cdots = x_{N-1} = x_N$ is contained in every plane $H_{ij}$. If the 
orthogonal complement to this line is $U$: $x_1 + x_2 + \cdots + x_{N-1} + x_N = 0$, we can define a partition 
on $U$, known to combinatorialists as $\mathcal{A}_{N-1}$ [4-5], with the set of hyperplanes $\mathcal{H}_{ij} = U \cap H_{ij}$. For 
reasons that are explained below, the points outside $U$ are not relevant to our construction. 

$^2$ That pass through the origin. 

$^3$ The faces in the induced decomposition of the polytope ordered by inclusion. 

$^4$ If $P \subset \mathbb{R}^p$ and $Q \subset \mathbb{R}^q$ are polytopes, the product polytope $P \times Q$ has the set of vertices $(x, y) \in \mathbb{R}^{p+q}$ where $x$ 
and $y$ are vertices of $P$ and $Q$ respectively. 

The set of all points $x \in U$ having the same sign vector $S$ forms a cell in the decomposition of 
$U$ induced by $\mathcal{A}_{N-1}$. Associated with this decomposition is the following important structure: the 
face poset, which is the set of all cells induced by $\mathcal{A}_{N-1}$ ordered by inclusion. The maximal cells 
(all $(N-1)$-dimensional) are called regions and are shaped as polyhedral cones; the coordinates 
of the points in the interior of a region obey the relation: 

$x_{i_1} < x_{i_2} < \ldots < x_{i_{N-1}} < x_{i_N}$ (1) 

The dominance relations (1) between the coordinates can be encoded by the sequence 

$(i_1)(i_2) \ldots (i_{N-1})(i_N)$ (2) 

hereafter referred to as the cell dominance partition sequence (DPS), where the set of indices 
$i_\alpha$ is a permutation of $(1, 2, \ldots, N-1, N)$. Each index appears enclosed in parentheses for 
reasons that will be made clear in the next section. 
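
For a point in general position the DPS (2) is simply the list of coordinate indices sorted by increasing coordinate value, as in relation (1). A minimal sketch follows; the function names and the sample point are ours.

```python
def dps(x):
    """Dominance partition sequence of a point in general position.

    Returns the 1-based indices i_1, ..., i_N such that
    x[i_1] < x[i_2] < ... < x[i_N], as in relation (1).
    """
    if len(set(x)) != len(x):
        raise ValueError("point is not in general position (tied coordinates)")
    return [i + 1 for i in sorted(range(len(x)), key=lambda i: x[i])]

def show(sequence):
    return "".join(f"({i})" for i in sequence)

print(show(dps([0.3, -1.2, 0.9, 0.0])))
```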

Reflecting a point in general position on $\mathcal{H}_{ij}$ gives an image where the coordinates $i$ and $j$ are 
switched and the others are left unchanged. Multiple reflections of a point on the hyperplanes 
$\mathcal{H}_{ij}$ generate a set of $N!$ images, which are the permutations of its coordinates. This leads to the 
fact that the $\binom{N}{2}$ hyperplanes form a Coxeter reflection arrangement [7] whose symmetry group 
is isomorphic to the symmetric group $S_N$ of permutations of the set $(1, 2, \ldots, N-1, N)$. 

The reflection group $\mathcal{A}_{N-1}$ is also the symmetry group of a polytope: the $N$-permutohedron or 
$\Pi_{N-1}$ [5], so called because its vertices are obtained by permuting the coordinates of the vector 
$(1, 2, \ldots, N-1, N)$. The faces of the permutohedron are polar to the cells of the hyperplane 
arrangement and the face lattices of both are isomorphic. 

For a molecule with $N$ atoms, as the $x$, $y$ and $z$ coordinates are independent of each other [1], 
we have an $\mathcal{A}_{N-1}$ partition for each of them, that is $\mathcal{A}_{N-1}^3$ for the whole CS. As has been 
emphasized in [1-2], the $-1$ is because of the translation symmetry: the conformations outside 
the hyperplane $U$ correspond to translated 3D structures. 

The radial dimension in CS is also spurious: multiplying the coordinates of an arbitrary 3D 
conformation by a positive factor generates a set of points lying on a half-line starting at the 
origin. The partition $\mathcal{A}_{N-1}$ is central, and this takes the scaling symmetry into account. 

$\mathcal{A}_{N-1}$, on the other hand, does not take into account the rotation symmetry [2]; the solution of 
this problem and its consequences will be discussed in sections IV to VII. 

III. The face lattice poset 

The combinatorial structure of the $\mathcal{A}_{N-1}$ face poset is the fundamental concept behind this work. 
It can be understood by studying a class of objects called tournaments, which are directed 
graphs with $N$ nodes [8]; these are used to investigate the properties of permutations, so useful 
for characterizing the cells in CS. 

A permutation of a set of $N$ elements can be represented by an acyclic, complete and labelled 
tournament (see fig. 1 for a description), where: 

• The term acyclic means that the graph contains no directed cycles. 

• A graph is complete if there is always an arc between any two nodes; if an arc goes from $i$ 
to $j$ we say that $i$ dominates $j$. The score of a node is the number of nodes it dominates. 

• Each node of the graph has a unique label, which is a number between 1 and $N$ that 
distinguishes it from the other nodes. 

In what follows the term tournament refers exclusively to tournaments where the above qualifiers 
apply. 



Figure 1. 

a) A complete acyclic tournament corresponding to the permutation (3, 6, 1, 4, 2, 5), which is the score of each vertex 
plus 1; the indices in the dominance sequence of vertices (3)(5)(1)(4)(6)(2) correspond to the inverse permutation. 

b) The antisymmetric incidence matrix; the rows in the upper triangle form the sign vector. 

For a tournament with $N$ nodes the following statements are true: 

I. In a tournament there is always a node called the sink that is dominated by every other 
node. 

Consider the last node of any maximal directed path: if an arc connects it to another node 
then either the path is not maximal or there is a cycle. If there were another sink it would 
be connected to the former, and either it would dominate or be dominated. 

II. In a tournament there is always a node called the source that dominates every other node. 

III. Any subgraph of a tournament is also a tournament. 

Any subgraph of a complete graph is also complete, and it can contain no cycles, otherwise 
they would also be present in the parent graph. 

IV. There is one maximal path that spans the graph. 

Consider the subtournament obtained by removing the source, then start the path with the 
arc that goes from the source to the subsource, and repeat the same step with the subgraph 
until you reach the sink. The path obtained goes through every node since there are $N - 1$ 
steps, and it is maximal since skipping a subsource for another node shortens the path, because 
that node is dominated by the subsource. 

V. The sequence of labels of the nodes visited by the maximal path is the dominance partition 
sequence. 

By the construction procedure the first node, the source, dominates all other nodes, the 
second dominates the remaining nodes and so on. 
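
The construction in IV and V can be tried on the tournament of fig. 1. The sketch below builds the arcs from the permutation (3, 6, 1, 4, 2, 5), read as score plus 1, and extracts the maximal path by repeatedly removing the source; the function names are ours. For this example the path read from the sink to the source reproduces the sequence (3)(5)(1)(4)(6)(2) quoted in the caption.

```python
def tournament_from_ranks(ranks):
    """Arcs (i, j), meaning i dominates j, for nodes labelled 1..N.

    ranks[i-1] is the score of node i plus 1, so i dominates j
    exactly when ranks[i-1] > ranks[j-1].
    """
    n = len(ranks)
    return {(i, j) for i in range(1, n + 1) for j in range(1, n + 1)
            if ranks[i - 1] > ranks[j - 1]}

def maximal_path(arcs, n):
    """Repeatedly remove the source of the remaining subtournament."""
    remaining = set(range(1, n + 1))
    path = []
    while remaining:
        source = next(i for i in remaining
                      if all((i, j) in arcs for j in remaining if j != i))
        path.append(source)
        remaining.remove(source)
    return path

ranks = (3, 6, 1, 4, 2, 5)
arcs = tournament_from_ranks(ranks)
path = maximal_path(arcs, len(ranks))
print(path)        # source first
print(path[::-1])  # sink first
```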

Theorem 1. In a tournament the arcs between a set of consecutive nodes in the maximal path 
can be arbitrarily reversed and the resulting graph is still a tournament, provided the subgraph 
spanning the consecutive nodes is a tournament. 

Since the subgraph and its complement are tournaments they contain no cycles; thus a cycle 
must involve nodes from both the subgraph and the complement. But this is not possible, since by 
construction the set of consecutive nodes is dominated by the preceding nodes in the maximal 
path and likewise dominates the following ones. 

By V, reversing an arc between contiguous nodes is equivalent to a transposition in the DPS. 

Theorem 2. In a tournament encoded by $(i_1)(i_2) \cdots (i_\alpha) \cdots (i_{\alpha+n-1}) \cdots (i_{N-1})(i_N)$ the 
permutations in the set of $n$ consecutive indices $i_\alpha \ldots i_{\alpha+n-1}$ give a set of tournaments that encode the 
vertices of an $n$-permutohedron. 

If we restrict ourselves to the $n$-dimensional subspace spanned by the coordinates $(x_{i_\alpha}, \ldots, x_{i_{\alpha+n-1}})$, 
the permutations of the indices above correspond to the permutations of the coordinates of the 
vector $(\alpha, \alpha+1, \ldots, \alpha+n-1)$, which are the vertices of a $\Pi_{n-1}$. 

Corollary. The $n$-permutohedron is a face of $\Pi_{N-1}$. 

Obviously, since it is contained in the affine hyperplane $x_{i_\alpha} + x_{i_{\alpha+1}} + \ldots + x_{i_{\alpha+n-1}} = n(\alpha + (n-1)/2)$. 
This face is encoded by the DPS 

$(i_1)(i_2) \cdots (i_\alpha \cdots i_{\alpha+n-1}) \cdots (i_{N-1})(i_N)$ (3) 

that represents the set of $n!$ sequences that are permutations of the indices $i_\alpha$ to $i_{\alpha+n-1}$. 

Corollary. The sequence $(i_1)(i_2) \cdots (i_\alpha \cdots i_{\alpha+n-1}) \cdots (i_\beta \cdots i_{\beta+m-1}) \cdots (i_{N-1})(i_N)$ encodes the 
$(n+m-2)$-face $\Pi_{n-1} \times \Pi_{m-1}$. 

This can be seen from the definition given above of the product of polytopes. 

Thus the meaning of the parentheses in DPSs becomes apparent: each parenthesis enclosing a 
sequence of length $n$ encodes a $\Pi_{n-1}$ polytope, and the whole sequence encodes the product of 
all these polytopes. 
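
To make the role of the parentheses concrete, the following sketch expands a DPS given as a list of blocks into the cell sequences it covers and computes the dimension of the encoded face. The list-of-lists representation and the function names are assumptions made for illustration only.

```python
from itertools import permutations, product

def expand(blocks):
    """All cell sequences covered by a face given as a list of blocks.

    blocks: e.g. [[33, 91], [82], [86], [14]] for (33 91)(82)(86)(14).
    Each block of length n contributes its n! internal orderings.
    """
    per_block = [list(permutations(block)) for block in blocks]
    return [[i for part in choice for i in part] for choice in product(*per_block)]

def face_dimension(blocks):
    """A block of length n encodes an (n-1)-dimensional permutohedron."""
    return sum(len(block) - 1 for block in blocks)

face = [[33, 91], [82], [86], [14]]
for sequence in expand(face):
    print(sequence)
print(face_dimension(face))
```

A face with blocks of lengths $n$ and $m$ expands into $n! \, m!$ sequences and has dimension $n + m - 2$, in agreement with the second corollary.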

These sequences can be ordered by inclusion to form a face lattice poset, which is isomorphic to 
the one obtained with the sign vectors, since like DPSs they are another encoding scheme for 
tournaments [1]. 

This is an important feature because it implies the modularity of the model: the face lattice of 
a molecule can be obtained as the product of the face lattices of subsets of atoms. 



IV. Enumerating the orientations of a simplex 



For a simplex with random morphology we define the set of vectors that run along the edges and 
their associated central planes (figs. 2a and 2b) 

$e_{ij} = v_i - v_j, \quad 1 \le i < j \le 4$ (4) 
$\mathcal{E}_{ij} = \{x \in \mathbb{R}^3 : e_{ij} \cdot x = 0\}$ (5) 

Each plane divides 3D space into positive and negative halves 

$\mathcal{E}_{ij}^{+} = \{x \in \mathbb{R}^3 : e_{ij} \cdot x > 0\}$ and $\mathcal{E}_{ij}^{-} = \{x \in \mathbb{R}^3 : e_{ij} \cdot x < 0\}$ (6) 

As for the regular tetrahedron described above, (5) and (6) generate an $\mathcal{A}_3$ partition of 3D space 
into 24 irregularly shaped cells (fig. 2b). 




Figure 2. The $\mathcal{A}_3$ partition of a random simplex. 

a) The random simplex with the vectors centered at the origin. 

b) The partition of 3D space by the planes $\mathcal{E}_{ij}$, represented as intersecting disks centered at the origin; visible 3D 
cells are designated by their sign vector and 1-dimensional cells are labelled by the corresponding $f_{\ldots}$ symbols (7). 

This partition has the following interesting property: assume for instance that the $x$ axis of a 
central orthogonal reference system in general position lies entirely within the cell encoded by 
the permutation (3,1,4,2), or equivalently the sign vector $(+-+--+)$; then the dominance 
relation $v_{2x} < v_{4x} < v_{1x} < v_{3x}$ holds for the $x$ coordinates of the vertices of the simplex. 
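
This property can be checked numerically. The sketch below projects the vertices of a simplex on a reference axis and returns both the sign vector, taken as $\mathrm{SIGN}(e_{ij} \cdot \text{axis})$ over the pairs $i < j$ in lexicographic order, and the permutation of ranks. The simplex coordinates are hypothetical, chosen so that $v_{2x} < v_{4x} < v_{1x} < v_{3x}$, and general position (no ties) is assumed.

```python
from itertools import combinations

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def axis_cell(vertices, axis):
    """Sign vector and rank permutation of a simplex seen along one axis.

    vertices: four 3D points v_1..v_4; axis: direction of the reference axis.
    The sign for the pair (i, j), i < j, is SIGN(e_ij . axis), e_ij = v_i - v_j.
    Ties are not handled: the axis is assumed to be in general position.
    """
    heights = [dot(v, axis) for v in vertices]
    signs = "".join("+" if heights[i] > heights[j] else "-"
                    for i, j in combinations(range(4), 2))
    ranks = tuple(1 + sum(h < mine for h in heights) for mine in heights)
    return signs, ranks

# Hypothetical simplex whose x coordinates satisfy v2 < v4 < v1 < v3.
simplex = [(2.0, 0.1, 0.0), (0.0, 1.0, 0.3), (3.0, -0.5, 1.0), (1.0, 0.2, -1.0)]
print(axis_cell(simplex, (1.0, 0.0, 0.0)))
```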

This suggests a method for enumerating the cells in $\mathcal{A}_3^3$ that correspond to the different 
orientations$^5$ of the simplex: it suffices to enumerate the cells with the lowest dimensions; the more 
numerous (3, 3, 3)-dimensional cells can be easily obtained through the connecting paths in the 
face lattice. 

$^5$ All along this work the term orientation is used interchangeably with DPS and sign vector. 



The 1-dimensional cells in $\mathcal{A}_3$ are determined by the set of vectors perpendicular to the faces of 
the simplex and to pairs of opposite edges 

$f_{123} = e_{12} \wedge e_{23}$, $f_{124} = e_{12} \wedge e_{24}$, $f_{134} = e_{13} \wedge e_{34}$, $f_{234} = e_{23} \wedge e_{34}$, 

$f_{12} = e_{12} \wedge e_{34}$, $f_{13} = e_{13} \wedge e_{24}$, $f_{14} = e_{14} \wedge e_{23}$ (7) 

their corresponding central planes will be designated $\mathcal{F}_{ijk}$ and $\mathcal{F}_{ij}$. 

If we take the sign of the scalar products between the sets of vectors (4) and (7) we obtain a 
matrix 
that, up to a sign reversal, is an invariant [4,9]: it is the same for any simplex whatever its 
morphology. The rows are the sign vectors of the 1-dimensional cells with the corresponding 
dominance partition sequence on the right; these cells can be seen in fig. 2b, where the labels $f_{ijk}$ 
and $f_{ij}$ are on top of the lines intersected by the planes $\mathcal{E}_{ij}$, $\mathcal{E}_{ik}$, $\mathcal{E}_{jk}$ and $\mathcal{E}_{ij}$, $\mathcal{E}_{kl}$ respectively. 
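
The invariance can be checked numerically. The sketch below computes the signs of the scalar products between the vectors (7) and (4) for random simplexes and verifies that every matrix agrees with the first one up to a global sign reversal. Integer coordinates are used so that the signs, including the zeros, are exact; the row and column ordering and all names are our own choices.

```python
import random

EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
# Each f vector of (7) is the cross product of two edge vectors.
NORMALS = {"f123": ((1, 2), (2, 3)), "f124": ((1, 2), (2, 4)),
           "f134": ((1, 3), (3, 4)), "f234": ((2, 3), (3, 4)),
           "f12": ((1, 2), (3, 4)), "f13": ((1, 3), (2, 4)),
           "f14": ((1, 4), (2, 3))}

def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sign(value):
    return (value > 0) - (value < 0)

def sign_matrix(vertices):
    """Rows: the 7 vectors f of (7); columns: the 6 edge vectors e_ij of (4)."""
    e = {(i, j): sub(vertices[i - 1], vertices[j - 1]) for i, j in EDGES}
    f = {name: cross(e[a], e[b]) for name, (a, b) in NORMALS.items()}
    return [[sign(dot(f[name], e[pair])) for pair in EDGES] for name in NORMALS]

def random_simplex(rng):
    """Four integer points with non-zero volume (integers keep signs exact)."""
    while True:
        v = [tuple(rng.randint(-50, 50) for _ in range(3)) for _ in range(4)]
        if dot(sub(v[0], v[3]), cross(sub(v[0], v[1]), sub(v[1], v[2]))) != 0:
            return v

rng = random.Random(0)
reference = sign_matrix(random_simplex(rng))
for _ in range(1000):
    m = sign_matrix(random_simplex(rng))
    flipped = [[-s for s in row] for row in m]
    assert m == reference or flipped == reference
```

The global sign corresponds to the chirality of the simplex: every entry is the sign of a triple product of edge vectors, which is either zero or proportional to the signed volume.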

We start by enumerating the orientations of a reference system whose $z$ axis is parallel to one 
of the vectors (7), $f_{123}$ for example; the remaining axes $x$ and $y$ will be on the plane $\mathcal{F}_{123}$. The 
problem is to determine how the $\mathcal{E}_{ij}$s (5) divide this plane into 2-dimensional cells. In fig. 3 we 
can see the four possible 12-sector partitions that can be generated by the vectors $e_{12}$, $e_{13}$ and 
$e_{23}$ and the perpendicular intersections of the planes $\mathcal{E}_{12}$, $\mathcal{E}_{13}$ and $\mathcal{E}_{23}$. This partition gives us 
only half of the sign vector components; to obtain the remaining ones we need to introduce a 
morphological classification of simplexes. 



V. Morphological classification of simplexes 



For a given simplex, like the one in fig. 2a for instance, we compute the sign of the scalar products 
of the vectors (4) and (7) between them; this gives the following two tables 
(9) 



The set of signs (9) refers mostly to angles between adjacent edges and dihedral angles between 
contiguous faces: +, 0 and - are for acute, right and obtuse angles respectively. 




Figure 3. The four possible partitions of the plane $\mathcal{F}_{123}$. 

Within figs. a to d the vector $f_{123}$ points in the upward direction, the labels $e_{12}$, $e_{13}$ and $e_{23}$ are over the lines that 
run along these vectors, and the corresponding perpendicular lines are the intersections with the planes $\mathcal{E}_{12}$, $\mathcal{E}_{13}$ 
and $\mathcal{E}_{23}$ respectively. The label $f_{ijk}$ means that the corresponding line runs along the projection of vector $f_{ijk}$ on 
$\mathcal{F}_{123}$. The labels $f_{124}$ and $f_{12}$ over the intersection of plane $\mathcal{E}_{12}$, for instance, are there because $f_{124}$ and $f_{12}$ are contained 
in that plane, and reciprocally $e_{12}$ is contained in the planes $\mathcal{F}_{124}$ and $\mathcal{F}_{12}$. 

All these lines converge at the origin and partition $\mathcal{F}_{123}$ into 12 sectors: between the inner and outer circles are 
the sign vector components of $e_{12}$, $e_{13}$ and $e_{23}$; for each sector they should be read from inside out in that order. 
Within the inner circle there are the sign vector components of $f_{124}$, $f_{134}$ and $f_{234}$ respectively. 
The sectors are numbered from 1 to 12 as indicated in a. 

Thus the rough morphological characteristics of a simplex can be encoded in a 36 bit binary$^6$ 
sequence: there are a total of 3936 sequences that correspond to geometrically realizable simplexes; 
these define the set of morphological classes $A$ of labelled simplexes. We define the volume of a 
class as the set of cells it spans in $\mathcal{A}_3^3$. 

$^6$ We exclude sequences harboring 0s as they form a set of null measure. 
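
The count of 36 signs is consistent with taking the scalar products within each of the two sets: 15 pairs among the 6 edge vectors (4) and 21 pairs among the 7 vectors (7). The following sketch computes such a sequence for a given simplex; the ordering of the signs within the sequence, the function names and the sample coordinates are our own assumptions and need not match the layout of the tables (9).

```python
from itertools import combinations

EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
# Pairs of edges whose cross products give the 7 vectors of (7).
NORMALS = [((1, 2), (2, 3)), ((1, 2), (2, 4)), ((1, 3), (3, 4)), ((2, 3), (3, 4)),
           ((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]

def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def morphological_code(vertices):
    """36 signs: 15 products between edge vectors, then 21 between f vectors."""
    edge = {(i, j): sub(vertices[i - 1], vertices[j - 1]) for i, j in EDGES}
    e = [edge[pair] for pair in EDGES]
    f = [cross(edge[a], edge[b]) for a, b in NORMALS]
    pairs = list(combinations(e, 2)) + list(combinations(f, 2))
    return "".join("+" if dot(p, q) > 0 else "-" if dot(p, q) < 0 else "0"
                   for p, q in pairs)

simplex = [(0, 0, 0), (4, 1, 0), (1, 5, 2), (2, 1, 6)]  # hypothetical coordinates
print(morphological_code(simplex))
```

Since only scalar products are involved, the sequence is unchanged by translations, rotations and positive scalings of the simplex.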

It should be recalled that this classification has a graph structure, since geometrical deformations 
of a simplex from one class induce a transition to other classes, thus establishing a connectivity 
between them. The precise structure of such a graph is of no use in the present work, but the 
concept is important when we introduce below the dynamical states of a simplex. 

The binary sequence (9) is instrumental in finding the partition of the planes perpendicular to 
1-dimensional cells. In our example it can be deduced from (9) that the partition of $\mathcal{F}_{123}$ is the 
one of fig. 3c, since it is the only one that satisfies the relation 

$(\mathrm{SIGN}(e_{12} \cdot e_{13}), \mathrm{SIGN}(e_{12} \cdot e_{23}), \mathrm{SIGN}(e_{13} \cdot e_{23})) = (+-+)$ 

There are also the relations concerning vectors $e_{14}$, $e_{24}$ and $e_{34}$ 

$(\mathrm{SIGN}(e_{14} \cdot e_{12}), \mathrm{SIGN}(e_{14} \cdot e_{13}), \mathrm{SIGN}(e_{14} \cdot e_{23})) = (+++)$ (10a) 
$(\mathrm{SIGN}(e_{24} \cdot e_{12}), \mathrm{SIGN}(e_{24} \cdot e_{13}), \mathrm{SIGN}(e_{24} \cdot e_{23})) = (--+)$ (10b) 
$(\mathrm{SIGN}(e_{34} \cdot e_{12}), \mathrm{SIGN}(e_{34} \cdot e_{13}), \mathrm{SIGN}(e_{34} \cdot e_{23})) = (--+)$ (10c) 

thus $e'_{14}$, the projection$^7$ of $e_{14}$, must lie in sectors 2 or 3 by (10a); similarly $e'_{24}$ and $e'_{34}$ must be 
in sectors 6 or 7 by (10b) and (10c). These ambiguities can be resolved by the set of relations 

$(\mathrm{SIGN}(e_{14} \cdot f_{123}), \mathrm{SIGN}(e_{24} \cdot f_{123}), \mathrm{SIGN}(e_{34} \cdot f_{123})) = (+++)$ (11a) 
$(\mathrm{SIGN}(f_{124} \cdot f_{123}), \mathrm{SIGN}(f_{134} \cdot f_{123}), \mathrm{SIGN}(f_{234} \cdot f_{123})) = (+++)$ (11b) 
$(\mathrm{SIGN}(f_{12} \cdot f_{123}), \mathrm{SIGN}(f_{13} \cdot f_{123}), \mathrm{SIGN}(f_{14} \cdot f_{123})) = (-++)$ (11c) 

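$e_{14}$, for instance, lies on $\mathcal{F}_{124}$ and together with $f_{124}$ stands above $\mathcal{F}_{123}$, by (11a) and (11b); this 
implies that $\mathrm{SIGN}(e'_{14} \cdot f'_{124}) = -$. Repeating this procedure for $f_{134}$ and $f_{234}$, and for each of 
the vectors $e_{24}$ and $e_{34}$, we end up with 

$(\mathrm{SIGN}(e'_{14} \cdot f'_{124}), \mathrm{SIGN}(e'_{14} \cdot f'_{134}), \mathrm{SIGN}(e'_{14} \cdot f'_{14})) = (---)$ (12a) 

$(\mathrm{SIGN}(e'_{24} \cdot f'_{124}), \mathrm{SIGN}(e'_{24} \cdot f'_{13}), \mathrm{SIGN}(e'_{24} \cdot f'_{234})) = (---)$ (12b) 

$(\mathrm{SIGN}(e'_{34} \cdot f'_{12}), \mathrm{SIGN}(e'_{34} \cdot f'_{134}), \mathrm{SIGN}(e'_{34} \cdot f'_{234})) = (+--)$ (12c) 

(12a), (12b) and (12c) imply that $e'_{14}$, $e'_{24}$ and $e'_{34}$ are to be found in sectors 3, 6 and 7 respectively, 
thus removing these ambiguities. 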
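
The sign rule behind relations (12) can be written out as a short derivation. It is a sketch in our own notation: $n$ is the unit vector along $f_{123}$, $e$ is an edge vector and $f$ one of the vectors (7) perpendicular to it, and both $e \cdot n$ and $f \cdot n$ are assumed to be non-zero.

```latex
% Decompose e and f into their projections on the plane F_{123} and their components along n
e = e' + (e \cdot n)\, n, \qquad f = f' + (f \cdot n)\, n
% e lies in the plane perpendicular to f, hence e . f = 0
0 = e \cdot f = e' \cdot f' + (e \cdot n)(f \cdot n)
\;\Longrightarrow\;
\mathrm{SIGN}(e' \cdot f') = -\,\mathrm{SIGN}(e \cdot f_{123})\,\mathrm{SIGN}(f \cdot f_{123})
% Example: e = e_{34}, f = f_{12}, with SIGN(e_{34} . f_{123}) = + by (11a)
% and SIGN(f_{12} . f_{123}) = - by (11c)
\mathrm{SIGN}(e'_{34} \cdot f'_{12}) = -(+)(-) = +
```

This reproduces the first component of (12c); the other components follow in the same way from (11a), (11b) and (11c).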

There is one ambiguity, though, that cannot be resolved by the binary sequence (9): $H_{24}$ runs 
through sectors 3 and 9 together with $e'_{14}$, and $H_{14}$ runs through sectors 6 and 12 as $e'_{24}$, so we 
end up with two possible partitions of $\mathcal{F}_{123}$ that are shown in fig. 4. 

As can be seen from fig. 4, each partition generates 12 2-dimensional cells and the same number 
in one dimension. By construction the lines along the 1-dimensional cells are never perpendicular 
to each other; as a consequence, for an $(x, y)$ reference system centered at the origin, if one of the 
axes runs along the edge of a sector the other will be located inside a sector. Rotating the axis 
system enables us to scan 12 (1,2,3) and 12 (2,1,3) dimensional cells (see fig. 4). 

Thus for any orientation structure associated with a plane $\mathcal{F}_{\ldots}$, a reference system with one axis 
perpendicular to the plane can be in $2 \times 12 \times 6$ cells with dimensions any permutation of the 
sequence (3,2,1) in $(x,y,z)$. This solves the problem of enumerating the cells with the lowest 
possible dimensions that correspond to an orientation of the simplex; the (3, 3, 3)-dimensional 
cells can be found from these through the connecting paths in the $\mathcal{A}_3^3$ cell lattice poset. 

$^7$ The ' superscript designates the projection of a vector on $\mathcal{F}_{\ldots}$ 

Figure 4. The two possible orientation structures of $\mathcal{F}_{123}$. 

The thick lines are the intersections of the $\mathcal{E}_{ij}$s with $\mathcal{F}_{123}$, the thin ones are lines along the vectors $e_{ij}$ and $e'_{ij}$. The sign 
vectors of the 2-dimensional cells lie inside the circle, the 1-dimensional ones are outside along the corresponding 
partition line; they should be read from inside out. An (X,Y) axis system has been superimposed on the first 
structure as a visual aid to show how the sectors can be scanned. 


VI. The conformational space of a simplex 

We have seen that the binary sequences (9) cannot define unambiguous partitions of the planes 
$\mathcal{F}_{\ldots}$: for each there can be between 1 and 3 possible orientation structures, and between 1 
and 24 for each; in a given class only a fraction of the combinations between the different 
orientation structures, one from each plane, give geometrically realizable simplexes. 

To remove ambiguities we need to define a set $B$ of morphological classes such that for each one 
the range of geometrical variation only allows one orientation structure per $\mathcal{F}_{\ldots}$. An empirical 
Monte Carlo calculation yields a total of 125712 classes of labelled simplexes; a class $A$ has 
a number of subclasses $B$ that goes from a minimum of 1 up to a maximum of 220. These 
morphological subclasses have the remarkable property that for a 3D conformation any cell in 
the volume can be reached through a rotation, which is an obvious consequence of the one-to-one 
correspondence between $\mathcal{F}_{\ldots}$ planes and orientation structures. 

Thus a class A can be decomposed into a set of subclasses B, that can be unambiguously oriented 
in a standard 3D reference frame, and its volume in CS is simply the union of the volumes of its 
subclasses. 

VII. The orientation structures 



To achieve a morphological classification of simplexes we need to know how many classes of 
orientation structures there are, since the classes A decompose into subclasses B and each of 
these is determined by 7 orientation structures. 

A first classification concerns the circular order of the vectors $e_{ij}$ in the plane $\mathcal{F}_\alpha$. This can be 
deduced from the set of signs (9), for instance: by (4) and (7) the shortest circular path going 
through $e_{12}$, $e_{13}$ and $e_{23}$ must be less than $\pi$, and it runs counter-clockwise if the sign of $\mathcal{F}_{123} \cdot \mathcal{F}_\alpha$ 
is +. 

This example leads to the general solution that was discussed in [2]: the 7 vectors (7) define 
a central partition dual to $\mathcal{A}_3$ [9] that divides the 3D space into 32 cells. The sign vector of the 
cell that contains $\mathcal{F}_\alpha$ defines the sense of the shortest circular path that connects the projected 
vectors in the 7 ordered sets $\{e'_{12}, e'_{13}, e'_{23}\}$, $\{e'_{12}, e'_{14}, e'_{24}\}$, $\{e'_{13}, e'_{14}, e'_{34}\}$, $\{e'_{23}, e'_{24}, e'_{34}\}$, $\{e'_{12}, e'_{34}\}$, 
$\{e'_{13}, e'_{24}\}$ and $\{e'_{14}, e'_{23}\}$. This generates a set of 7 constraints from which the circular order of 
the $e'_{ij}$s in $\mathcal{F}_\alpha$ can be deduced, making a total of 32 possible circular orientations. 

As can be seen in fig. 4, on the plane $\mathcal{F}_\alpha$ each $e_{ij}$ contributes a total of 4 separations between 
sectors at periodic intervals of 90°, each comprising exactly 6 sectors. On the other hand there 
are two classes of separations: either a line along the vector $e_{ij}$ or the intersection of a plane 
$\mathcal{H}_{ij}$; in an interval of 90° the possible distributions of the two separators amount to a total of $2^5$ 
combinations. This makes 1024 classes of orientation structures like those in fig. 4; among these, 
48 appear not to be geometrically realizable since they are not found in any class $B$. 
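
The counts quoted in this section fit together as follows. The factor $2^5$ is inferred from the totals given in the text, and the value 976 is the number of orientation classes used in section VIII.

```latex
\underbrace{32}_{\text{circular orientations}} \times
\underbrace{2^{5}}_{\text{separator distributions}} = 1024,
\qquad
1024 - 48 = 976 \ \text{orientation structures found in the classes } B
```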

VIII. Determination of the graph of cells 

Most often, in mesoscopic models of biological macromolecules, atoms are represented as point-like 
structures surrounded by an atomic force field [10,11]; thus any four atoms are the vertices of 
a 3-simplex. Also, for a molecular system with $N$ atoms an order relation can be defined by 
numbering its atoms from 1 to $N$, so that each 3-simplex can be designated by a 4-tuple of ordered 
integers, which are the numbers of its atoms. 

Beyond the orientation problem, the classes $A$ and $B$ bring the possibility of analyzing the 
dynamics of a molecular system in terms of discrete entities. The range of morphological variation 
for simplexes within a molecule can be explored in molecular dynamics simulations (MDS), and 
the results can be summarized as follows [2,12]: 

• 90% of simplexes in a structure evolve within less than 20 classes A. 

• The maximum variation observed is somewhat less than 200 classes, about 5% of the total. 

This result opens up the possibility of determining the set of geometrically accessible cells in the 
CS of a molecular system. 

The CS of a simplex has a total of 13824 cells and, typically, the volume of a class A is about one 
third of that number, much less if we exclude structures that can be derived through a rotation. 
This volume is very small when compared to the huge number of cells spanned by a molecular 
system, and it can be reasonably assumed that the volume of a simplex can be scanned by a 
molecular dynamics run. What cannot be scanned by a simulation is the set of structures that 
arise by combining the local movements. 



MDSs can be used to determine the subgraph of classes spanned by every simplex, and the volume 
of the molecular system in CS can be obtained by progressively merging the CS of individual 
simplexes. As we were able to determine the different orientations of a simplex this process can 
be done excluding redundant rotated structures. 

Before proceeding further let us show with a simple example the basic operations that are involved 
in the process of merging CSs. Suppose we have two adjoining simplexes $S_\alpha$ and $S_\beta$ represented 
by the tetrads {14,33,82,86} and {14,82,86,91} respectively (notice that their common faces 
correspond to the vertices $(v_1, v_3, v_4)$ and $(v_1, v_2, v_3)$). If the 3D structure of $S_\alpha$ is in a cell encoded 
by the dominance partition sequence 

((82)(14)(86)(33), (33)(82)(86)(14), (86)(14)(33)(82)) (13) 

then the set of cells in $CS_\beta$ geometrically compatible with (13) will be those whose DPS contains 
the pattern 

((82)(14)(86), (82)(86)(14), (86)(14)(82)) (14) 

Thus a cell in $CS_\beta$ with DPS 

((82)(91)(14)(86), (91)(82)(86)(14), (86)(14)(91)(82)) (15) 

can be merged with (13) and generates the set of 4 cells in $CS_\alpha \times CS_\beta$ 

((82)(91)(14)(86)(33), (33 91)(82)(86)(14), (86)(14)(33 91)(82)) (16) 

which corresponds to a square face in the polar polytope. 
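
The merging operation of this example can be sketched in Python. The code below is only an illustration under a stated assumption: the two simplexes share a face, so that each has exactly one private atom. The function names are ours. Applied to (13) and (15) it reproduces (16) and its count of 4 cells.

```python
from math import factorial

def restrict(sequence, atoms):
    """Sub-sequence of a single-axis DPS restricted to the given atoms."""
    return [atom for atom in sequence if atom in atoms]

def private_gaps(sequence, shared):
    """Private atoms grouped by the number of shared atoms that precede them."""
    groups, seen = {}, 0
    for atom in sequence:
        if atom in shared:
            seen += 1
        else:
            groups.setdefault(seen, []).append(atom)
    return groups

def merge_axis(seq_a, seq_b):
    """Merge two single-axis DPSs; assumes one private atom per simplex.

    Returns a list of blocks, or None when the shared atoms do not appear
    in the same order (the two cells are then incompatible).
    """
    shared = set(seq_a) & set(seq_b)
    common = restrict(seq_a, shared)
    if common != restrict(seq_b, shared):
        return None
    gaps_a, gaps_b = private_gaps(seq_a, shared), private_gaps(seq_b, shared)
    merged = []
    for position in range(len(common) + 1):
        mine, theirs = gaps_a.get(position, []), gaps_b.get(position, [])
        if mine and theirs:
            merged.append(sorted(mine + theirs))  # relative order undetermined
        else:
            merged.extend([atom] for atom in mine + theirs)
        if position < len(common):
            merged.append([common[position]])
    return merged

def show(blocks):
    return "".join("(" + " ".join(map(str, block)) + ")" for block in blocks)

dps_13 = ([82, 14, 86, 33], [33, 82, 86, 14], [86, 14, 33, 82])
dps_15 = ([82, 91, 14, 86], [91, 82, 86, 14], [86, 14, 91, 82])
merged = [merge_axis(a, b) for a, b in zip(dps_13, dps_15)]
print(", ".join(show(blocks) for blocks in merged))

cells = 1
for blocks in merged:
    for block in blocks:
        cells *= factorial(len(block))
print(cells)
```

Restricting each axis of (13) to the shared atoms with `restrict` gives the pattern (14); two cells are compatible exactly when these restrictions coincide on the three axes.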

To calculate the graph of the geometrically accessible cells we begin by picking an arbitrary 
reference simplex, preferably one with low morphological variation, and arbitrarily choose an 
orientation among those available; this will be the simplex on level 1. The simplexes adjacent to 
this one form level 2, and so on. Since adjacent simplexes in a 3D structure share three 
vertices, the shortest adjacency path between any two of them has at most length 4, so we end 
up with simplexes in 5 levels. 

We need not include every simplex from the molecule to perform a useful calculation, but 
there is the minimum requirement that every pair of atoms, from a total of $\binom{N}{2}$, should be present 
at least once in a 4-tuple; otherwise the DPSs could not be determined. 
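
The minimum requirement on the choice of simplexes is easy to test. The sketch below lists the atom pairs that are not covered by a given set of 4-tuples; the function name and the 5-atom example are hypothetical.

```python
from itertools import combinations

def uncovered_pairs(n_atoms, simplexes):
    """Atom pairs (i, j), i < j, that appear in none of the given 4-tuples."""
    covered = set()
    for simplex in simplexes:
        covered.update(combinations(sorted(simplex), 2))
    return [pair for pair in combinations(range(1, n_atoms + 1), 2)
            if pair not in covered]

# Hypothetical 5-atom system described by two adjoining simplexes.
print(uncovered_pairs(5, [(1, 2, 3, 4), (2, 3, 4, 5)]))
```

In this example the pair (1, 5) is missing, so a third simplex containing both atoms would have to be added.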

The calculation can be done through the following procedure : 

1. Start at level 1. 

2. From any simplex in level n we select the compatible orientations in the adjoining simplexes 
in level n + 1. 

3. From any simplex in the level n + 1 we select compatible orientations on the adjoining 
simplexes at the same level. 

4. If n < 5 we go to step 2 and continue with level n + 1. 

A link is created between any two compatible orientations in adjacent simplexes. This is done in 
two steps: 

1. If the simplex in the lower level has not yet been visited, any orientation compatible with 
those from the simplex in the upper level is selected. 

2. Otherwise any orientation that has not been selected is discarded. Likewise, an orientation 
that fails to form a link with an adjacent simplex is discarded because of geometrical 
inconsistency. 
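
The organization of the simplexes into levels can be sketched as a breadth-first search in which two simplexes are adjacent when they share three atoms. The code below covers only this level assignment, not the selection of orientations; the function name and the 8-atom example, in which all 4-tuples are included, are hypothetical.

```python
from collections import deque
from itertools import combinations

def simplex_levels(simplexes, reference):
    """Breadth-first levels; simplexes sharing three atoms are adjacent."""
    simplexes = [frozenset(s) for s in simplexes]
    levels = {frozenset(reference): 1}
    queue = deque([frozenset(reference)])
    while queue:
        current = queue.popleft()
        for other in simplexes:
            if other not in levels and len(current & other) == 3:
                levels[other] = levels[current] + 1
                queue.append(other)
    return levels

atoms = range(1, 9)  # hypothetical 8-atom system, all 4-tuples included
levels = simplex_levels(combinations(atoms, 4), (1, 2, 3, 4))
print(max(levels.values()))
```

With all 4-tuples present, a simplex that shares no atom with the reference ends up on level 5, in agreement with the maximum path length of 4 stated above.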



The implementation of this procedure as an efficient computer algorithm requires that the CS 
of a class $A$ simplex be quickly searched for orientations compatible with those from the 
adjoining simplexes; these can be obtained from the set of orientation structures available to each 
1-dimensional cell $\mathcal{F}_{\ldots}$ (7). This requirement can be fulfilled by building a hash table from which 
DPSs like (15) can be retrieved. Such a table has the following set of entries: 

1. the number of the orientation class: from 1 to 976, 

2. the connecting face, numbered from 1 to 4, 

3. the 1-dimensional $\mathcal{F}_{\ldots}$ cell (7) corresponding to the orientation structure, numbered from 1 
to 7, 

4. the chirality of the simplex: right or left-handed, 

5. the pattern (14), of a total of 216 possible patterns. 
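
One possible layout for such a table is sketched below with a Python dictionary keyed by the five entries. The key layout, the function names and the placeholder class, face and cell numbers are all hypothetical; only the ranges of the entries and the count of 216 patterns come from the list above.

```python
from itertools import permutations, product

def make_key(orientation_class, face, cell, chirality, pattern):
    """Hypothetical key built from the five entries of the table."""
    assert 1 <= orientation_class <= 976
    assert 1 <= face <= 4
    assert 1 <= cell <= 7
    assert chirality in ("right", "left")
    return (orientation_class, face, cell, chirality, pattern)

def all_patterns(shared_atoms):
    """The 6^3 = 216 patterns like (14): one ordering of the shared face per axis."""
    orders = list(permutations(shared_atoms))
    return list(product(orders, repeat=3))

table = {}  # key -> list of DPSs such as (15)
pattern = ((82, 14, 86), (82, 86, 14), (86, 14, 82))  # pattern (14)
key = make_key(1, 1, 1, "right", pattern)  # placeholder class, face and cell numbers
table.setdefault(key, []).append(
    ((82, 91, 14, 86), (91, 82, 86, 14), (86, 14, 91, 82)))  # DPS (15)

print(len(all_patterns((14, 82, 86))))
```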

IX. Conclusion 

The aim of the present work has been to bring the sheer complexity of molecular conformational 
space down to tractable dimensions, by building a structure that encodes the set of geometrically 
accessible 3D conformations of a thermalized molecule and putting it in a compact and manageable 
code. The price to pay for this result is the loss of absolute precision over the local 
3D conformations of molecular structures [1], but this is of no concern here since we 
only seek to obtain a global view of conformational space. From this point of view the present 
formalism may be a useful complement to molecular dynamics simulations, which are unexcelled 
in the detailed exploration of small regions. 

What remains to be done is to explore the graph of cells with a Hamiltonian functional over a 
force field and perform energy optimizations. It should be emphasized that, as a Hamiltonian is a 
function of distances between atoms, the present structure offers the possibility of calculating the 
energy over entire regions of CS, since the interatomic distances can be enumerated for a set of 
cells and in this case the energy function is nothing else than an integral over a rational function. 